Abstract



Phylogenetic mixtures model the inhomogeneous molecular evolution
commonly observed in data. The performance of phylogenetic reconstruction 
methods where the underlying data is generated by a mixture model has 
stimulated considerable recent debate. Much of the controversy stems from 
simulations of mixture model data on a given tree topology for which 
reconstruction algorithms output a tree of a different topology; these findings 
were held up to show the shortcomings of particular tree reconstruction 
methods. In so doing, the underlying assumption was that mixture model data 
on one topology can be distinguished from data evolved on an unmixed tree of 
another topology given enough data and the "correct" method. Here we show 
that this assumption can be false. For biologists our results imply that, for 
example, the combined data from two genes whose phylogenetic trees differ
only in terms of branch lengths can perfectly fit a tree of a different topology. 

It is now well known that molecular evolution is heterogeneous, i.e.
that it varies across time and position (Simon et al., 1996). A classic example
is stems and loops of ribosomal RNA: the evolution of one side of a stem is
strongly constrained to match the complementary side, whereas for loops
different constraints exist (Springer and Douzery, 1996). Heterogeneous
evolution between genes is also widespread, where even the general features of
evolutionary history for neighboring genes may differ wildly (Ochman et al.,
2000). Presently it is not uncommon to use concatenated sequence data from
many genes for phylogenetic inference (Phillips et al., 2004), which can lead to
very high levels of apparent heterogeneity (Baldauf et al., 2000). Furthermore,
empirical evidence using the covarion model shows that sometimes more subtle
partitions of the data can exist, for which separate analysis is difficult
(Wang et al., 2007).



This heterogeneity is typically formulated as a mixture model
(Pagel and Meade, 2004). Mathematically, a phylogenetic mixture model is
simply a weighted average of site pattern frequencies derived from a number of 
phylogenetic trees, which may be of the same or different topologies. Even 
though many phylogenetics programs accept aligned sequences as input, the 
only data actually used in the vast majority of phylogenetic algorithms is the 
derived site pattern frequencies. Thus, in these algorithms, any record of 
position is lost and heterogeneous evolution appears identical to homogeneous 
evolution under an appropriate phylogenetic mixture model. For simplicity, we 
call a mixture of site pattern frequencies from two trees (which may be of the 
same or different topology) a mixture of two trees; when the two trees have the 
same underlying topology, the mixture will be called a mixture of branch 
length sets on a tree. 
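
The definition above can be made concrete with a minimal sketch, assuming the two-state symmetric model on a 12|34 quartet. The function names (`edge_prob`, `quartet_pattern_freqs`, `mix`) and the branch lengths in the usage note are illustrative choices, not taken from the paper.

```python
import itertools
import math


def edge_prob(length, a, b):
    """Two-state symmetric transition probability along one edge."""
    change = 0.5 * (1.0 - math.exp(-2.0 * length))
    return change if a != b else 1.0 - change


def quartet_pattern_freqs(lengths):
    """Expected site pattern frequencies on the 12|34 quartet.

    lengths = (t1, t2, t3, t4, t5): pendant edges 1-4, then internal edge 5.
    """
    t1, t2, t3, t4, t5 = lengths
    freqs = {}
    for pattern in itertools.product((0, 1), repeat=4):
        x1, x2, x3, x4 = pattern
        total = 0.0
        # Sum over the unobserved states u, v at the two internal nodes.
        for u, v in itertools.product((0, 1), repeat=2):
            total += (0.5
                      * edge_prob(t1, u, x1) * edge_prob(t2, u, x2)
                      * edge_prob(t5, u, v)
                      * edge_prob(t3, v, x3) * edge_prob(t4, v, x4))
        freqs[pattern] = total
    return freqs


def mix(freqs_a, freqs_b, weight):
    """Weighted average of two site pattern frequency vectors."""
    return {pat: weight * freqs_a[pat] + (1.0 - weight) * freqs_b[pat]
            for pat in freqs_a}
```

For instance, `mix(quartet_pattern_freqs((0.1, 0.8, 0.3, 0.3, 0.05)), quartet_pattern_freqs((0.8, 0.1, 0.6, 0.6, 0.05)), 0.25)` is a mixture of two branch length sets on one tree. No record of which site came from which set survives in the result.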

Mixture models have proven difficult for phylogenetic reconstruction 
methods, which have historically sought to find a single process explaining the 
data. For example, it has been shown that mixtures of two different tree
topologies can mislead MCMC-based tree reconstruction (Mossel and Vigoda,
2005). It is also known that there exist mixtures of branch length sets on one
tree which are indistinguishable from mixtures of branch length sets on a tree
of a different topology (Steel et al., 1994; Stefankovic and Vigoda, 2007a,b).
Recently, simulations of mixture models from "heterotachous" (changing rates
through time) evolution have been shown to cause reconstruction methods to
fail (Ruano-Rubio and Fares, 2007).

The motivation for our work is the observation that both theory and 
simulations have shown that in certain parameter regimes, phylogenetic 
reconstruction methods return a tree topology different from the one used to 
generate the mixture data. The parameter regime in this class of examples is 
similar to that shown in Figure 1, with two neighboring pendant edges which 
alternate being long and short. After mixing and reconstruction, these edges 
may no longer be adjacent on the reconstructed tree. We call this mixed 
branch repulsion. This phenomenon has been observed extensively in
simulation (Kolaczkowski and Thornton, 2004; Spencer et al., 2005;
Philippe et al., 2005; Gadagkar and Kumar, 2005) and it has been proved that
certain distance and maximum likelihood methods are susceptible to this effect
(Chang, 1996; Stefankovic and Vigoda, 2007a,b). Up to this point such results
have been interpreted as pathological behavior of the reconstruction
algorithms, which has led to a heated debate about which reconstruction
methods perform best in this situation (Steel, 2005;
Thornton and Kolaczkowski, 2005). Implicit in this debate is the assumption
that a mixture of trees on one topology gives different site pattern frequencies 
than that of an unmixed tree of a different topology. This leads to the natural 
question of how similar these two site pattern frequencies can be. 

Here we demonstrate that mixtures of two sets of branch lengths on a 
tree of one topology can exactly mimic the site pattern frequencies of a tree of 
a different topology under the two-state symmetric model. In fact, there is a 
precisely characterizable (codimension two) region of parameter space where 
such mixtures exist. Consider two quartet trees of topology 12|34, as shown in
Figure 1. Label the pendant branches 1 through 4 according to the taxon
labels, and label the internal edge with 5. The first branch length set will be
written $t_1, \ldots, t_5$ and the second $s_1, \ldots, s_5$. Now, if $k_1, \ldots, k_4$ satisfy the
following system of inequalities

$$k_1 > k_3 > k_4 > 1 > k_2 > 0,$$

$$\frac{1-k_1^2}{k_1}\,\frac{1-k_4^2}{k_4} + \frac{1-k_2^2}{k_2}\,\frac{1-k_3^2}{k_3} > 0,$$

$$\frac{k_1+k_4}{1+k_1 k_4} \cdot \frac{k_2+k_3}{1+k_2 k_3} > 1,$$

then they specify a class of examples of mixed branch repulsion. More
precisely, there exist nonzero internal branch lengths $t_5$ and $s_5$, mixing
weights, and positive numbers $\ell_1, \ldots, \ell_4$, such that if for $i = 1, \ldots, 4$,
$k_i = \exp(-2(t_i - s_i))$ and $t_i > \ell_i$, the corresponding mixture of two 12|34
trees will have the same site pattern frequencies as a single tree of the 13|24
topology. We have illustrated two examples of branch length sets satisfying
these criteria in Figure 1 and provided the corresponding branch lengths in
Table 1.
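
As a rough illustration, the three inequalities (in the form stated in Proposition 6 of the Appendix) can be checked mechanically. The function name `in_repulsion_zone` and the numerical values in the usage note are hypothetical; they are not the branch lengths of Table 1.

```python
def in_repulsion_zone(k1, k2, k3, k4):
    """Check the three inequalities on k_i = exp(-2 (t_i - s_i))."""
    # Ordering condition.
    if not (k1 > k3 > k4 > 1 > k2 > 0):
        return False
    # Sum of products of (1 - k^2) / k terms must be positive.
    second = (((1 - k1 ** 2) / k1) * ((1 - k4 ** 2) / k4)
              + ((1 - k2 ** 2) / k2) * ((1 - k3 ** 2) / k3)) > 0
    # Product of the two ratio terms must exceed one.
    third = ((k1 + k4) / (1 + k1 * k4)) * ((k2 + k3) / (1 + k2 * k3)) > 1
    return second and third
```

With these assumed values, `in_repulsion_zone(20, 0.5, 3, 1.2)` holds, whereas `in_repulsion_zone(4, 0.5, 3, 2)` satisfies the ordering but fails the last inequality.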

The exact zone for mixed branch repulsion is described above and
detailed in Proposition 6; here we present some simple necessary criteria for
mixed branch repulsion to occur. First, note that except for the internal edge
and a (typically small) lower bound on pendant branch lengths, the relevant
parameters are differences of branch lengths between sets rather than absolute
branch lengths themselves. Given two branch length sets with edges numbered
as above, let $d_i$ denote the difference between the branch lengths for edge $i$,
i.e. $t_i - s_i$. Then (perhaps after changing the arbitrary numbering of the taxa)
either $d_1 > d_3 > d_4 > 0 > d_2$ or $d_1 > 0 > d_3 > d_4 > d_2$ must be satisfied in
order for mixed branch repulsion to occur. Thus, for example, in one set of
branch lengths the pendant edge for taxon 1 should be long and the pendant
edge for taxon 2 should be short, while in the other set of branch lengths these
roles should be reversed. On the other hand, the branch lengths for taxa 3 and
4 should be both long for one set and both short for the other. Additionally,
at least one of the two internal branch lengths needs to be relatively short. 
There are other more complex criteria, but the above is necessary for exact 
mixed branch repulsion to occur. However, as noted below, exact mixed 
branch repulsion is not necessary to "fool" model based methods. 
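
The ordering criterion on branch length differences can be sketched as a quick screening test. This is only the necessary ordering condition, not the full zone. Interpreting "changing the arbitrary numbering of the taxa" as trying every relabeling that keeps the 12|34 split is an assumption of this sketch, as are the names `SYMMETRIES` and `passes_necessary_ordering`.

```python
# Relabelings of taxa (1, 2, 3, 4), written 0-based, that keep the 12|34 split.
SYMMETRIES = [
    (0, 1, 2, 3), (1, 0, 2, 3), (0, 1, 3, 2), (1, 0, 3, 2),
    (2, 3, 0, 1), (3, 2, 0, 1), (2, 3, 1, 0), (3, 2, 1, 0),
]


def passes_necessary_ordering(t, s):
    """t, s: pendant branch lengths (edges 1-4) of the two sets."""
    d = [ti - si for ti, si in zip(t, s)]
    for perm in SYMMETRIES:
        d1, d2, d3, d4 = (d[i] for i in perm)
        if d1 > d3 > d4 > 0 > d2 or d1 > 0 > d3 > d4 > d2:
            return True
    return False
```

A pair of sets that fails this check cannot show exact mixed branch repulsion under the stated criterion. A pair that passes still has to satisfy the remaining conditions.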

We believe that this similarity between site pattern frequencies 
generated by mixtures of branch lengths on one tree and corresponding 
unmixed frequencies on a different tree is what is leading to the mixed branch 
repulsion observed in theory and simulation. Furthermore, it is possible that 
even the simple case presented here is directly relevant to reconstructions from 
data. First, it is not uncommon to simplify the genetic code from the four 
standard bases to two (pyrimidines versus purines) in order to reduce the 
effect of compositional bias when working with genome-scale data on deep 
phylogenetic relationships (Phillips et al., 2004). Second, when working on
such relationships concatenation of genes is common (Baldauf et al., 2000), for
which a phylogenetic mixture is the expected result. Finally, the region of
parameter space bringing about mixed branch repulsion may become more
extensive as the number of concatenated genes increases. Therefore in
concatenated gene analysis it may be worthwhile considering incongruence in
terms of branch lengths and not just in terms of topology (Rokas et al., 2003;
Jeffroy et al., 2006), as highly incongruent branch lengths may produce
artifactual results upon concatenation. Other methods may be useful in this 
setting, such as gene order data, gene presence/absence, or coalescent-based 
methods to infer the most likely species tree from a collection of gene trees. 

Mixed branch repulsion may be more difficult to detect than the usual 
model mis-specification issues; in the cases presented here the mis-specified 
single tree model fits the data perfectly. In contrast, although using the wrong 
mutation model for reconstruction using maximum likelihood can lead to
incorrect tree topologies (Goremykin et al., 2005), the resulting model
mis-specification can be seen from a poor likelihood score. In the mixtures
presented here, there is no way of telling when one is in the mixed regime on
one topology or an unmixed regime on another topology. Furthermore, any
model selection technique (including likelihood ratio tests, the Akaike
Information Criterion and the Bayesian Information Criterion) which chooses a
simple model given equal likelihood scores would, in this case, choose a simple 
unmixed model. Thereby it would select a tree that is different from the 
historically correct tree if the true process was generated by a mixture model. 

The derivation of the zone resulting in mixed branch repulsion is a 
conceptually simple application of two of the pillars of theoretical 
phylogenetics: the Hadamard transform and phylogenetic invariants 
(Hendy and Penny, 1989; Semple and Steel, 2003; Felsenstein, 2004). The
Hadamard transform is a closed form invertible transformation (expressed in 
terms of the discrete Fourier transform) for gaining the expected site pattern 
frequencies from the branch lengths and topology of a tree or vice versa. 
Phylogenetic invariants characterize when a set of site pattern frequencies 
could be the expected site pattern frequencies for a tree of a given topology. 
They are identities in terms of the discrete Fourier transform of the site
pattern frequencies. Therefore, to derive the above equations, we simply insert 
the Hadamard formulae for the Fourier transform of pattern probabilities into 
the phylogenetic invariants, then check to make sure the resulting branch 
lengths are positive. 

Similar considerations lead to an understanding of when it is possible 
to mix two branch length sets on a tree to reproduce the site pattern 
frequencies of a tree of the same topology (Proposition 3 of the Appendix). For a
quartet, two cases are possible. First, a pair of neighboring pendant branch 
lengths can be equal between the two branch length sets of the mixture. 
Alternatively, the sum of one pair of neighboring pendant branch lengths and 
the difference of the other pair can be equal. For trees larger than quartets, 
the allowable mixtures are determined by these restrictions on the quartets 
(results to appear elsewhere). For pairs of branch lengths satisfying these 
criteria, any choice of mixing weights will produce site pattern frequencies 
satisfying the phylogenetic invariants. 

Intuitively, one might expect that when two sets of branch lengths mix 
to mimic a tree of the same topology, some sort of averaging property would 
hold for the branch lengths. This is true for pairwise distances in the tree but 
need not be the case for individual branches, as demonstrated by Figure 2. In 
fact, it is possible to mix two sets of branch lengths on a tree to mimic a tree 
of the same topology such that a resulting pendant branch length is arbitrarily 
small while the corresponding branch length in either of the branch length sets 
being mixed stays above some arbitrarily large fixed value. 

The results in this paper shed some light on the geometry of
phylogenetic mixtures (Kim, 2000). As is well known, the set of phylogenetic
trees of a given topology forms a compact subvariety of the space of site
pattern frequencies (Sturmfels and Sullivant, 2005). The first part of our work
demonstrates that there are pairs of points in one such subvariety such that a
line between those two points intersects a distinct subvariety (see Figure 3).
Therefore the convex hull of one subvariety has a region of intersection with
distinct subvarieties. This is stronger than the recently derived result by
Stefankovic and Vigoda (2007a,b) that the convex hulls of the varieties
intersect. The second part of our work shows that there exist pairs of points in 
a subvariety such that the line between those points intersects the subvariety. 
Furthermore, it demonstrates that when such a line between two points 
intersects the subvariety in a third point, then a subinterval of the line is 
contained in the subvariety. 

This geometric perspective can aid in understanding practical 
problems of phylogenetic estimation. The question of when maximum 
likelihood selects the "wrong" topology given mixture data was initiated by
Chang (1996), who found a one-parameter space of such examples under the
two-state symmetric (CFN) model. Recently Stefankovic and Vigoda (2007a)
found a two-parameter space of such examples for the CFN model, and a
one-dimensional space of examples for the Jukes-Cantor DNA (JC) and
Kimura two and three parameter (K2P, K3P) models. A potential criticism of
these results is that because the set of examples has lower dimension than the
ambient parameter space one is unlikely to encounter them in practice. 

However, a simple geometric argument can show that the dimension of 
the set of all such pathological examples is equal to the dimension of the 
parameter space for all four of these models. To see why this holds we first 
recall the definition of the Kullback-Leibler divergence of probability
distribution $q$ from a second distribution $p$:

$$\delta_{KL}(p, q) = \sum_i p_i \log \frac{p_i}{q_i}.$$

The $p$ vector is typically thought of as a data vector and the $q$ vector is
typically the model data. Maximum likelihood seeks to find the model data
vector $q$ which minimizes $\delta_{KL}(p, q)$. Let $V_{12|34}$ be the set of all data vectors
which correspond exactly to trees of topology 12|34, and similarly for $V_{13|24}$.
For $V = V_{12|34}$ or $V_{13|24}$ let $\delta_{KL}(p, V)$ denote the divergence of $p$ from the
"closest" point in $V$, i.e. the minimum of $\delta_{KL}(p, v)$ where $v$ ranges over $V$.
We show in Lemma [5] that this function exists and is continuous across the set
of probability vectors $p$ with all components positive.
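
A minimal sketch of the divergence follows. Here `closest_divergence` stands in for the minimum over a variety by searching a finite list of candidate model vectors, which is a simplification assumed for illustration; the true minimum ranges over a continuous set.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence of q from p (terms with p_i = 0 contribute 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def closest_divergence(p, candidates):
    """Smallest divergence of p from any vector in a finite candidate list."""
    return min(kl_divergence(p, v) for v in candidates)
```

In this simplified picture, maximum likelihood prefers the topology whose candidate set yields the smaller `closest_divergence` for the observed frequency vector.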

Now, pick any of the above group-based models, and let $y$ be a
corresponding pathological mixture on 12|34 for that model supplied by
Theorem 2 of Stefankovic and Vigoda (2007a). Maximum likelihood chooses
topology 13|24 over 12|34 for a data vector $p$ exactly when $\delta_{KL}(p, V_{13|24})$ is less
than $\delta_{KL}(p, V_{12|34})$; therefore $\delta_{KL}(y, V_{13|24}) < \delta_{KL}(y, V_{12|34})$. By the properties
of continuous functions, this inequality also holds for all probability vectors $y'$
close to $y$ which also have all components positive. Therefore ML will choose
13|24 over 12|34 for all such $y'$. Because the transformation taking branch
length and mixing weight parameters to expected site pattern frequencies is 
continuous, one can change branch lengths and mixing weight arbitrarily by a 
small amount and still have ML choose 13|24 for the resulting data. This gives 
the required full-dimensional space of examples. 

We now indicate how our results fit into previous work on 
identifiability and discuss prospects for generalization. For four-state models 
with extra symmetries such as the Jukes-Cantor DNA model and the Kimura 
two-parameter model it is known that there exist linear phylogenetic 
invariants which imply identifiability of the topology for mixture model data
(Stefankovic and Vigoda, 2007a). The topology is also identifiable for
phylogenetic mixtures in which each underlying process is described by an
infinite state model (Mossel and Steel, 2004; Mossel and Steel, 2005); such
processes may be relevant to data involving rare (homoplasy-free) genomic
changes. Therefore the pathologies observed here could not occur for those
models. Furthermore, Allman and Rhodes (2006) have shown generic
identifiability (i.e. identifiability for "almost all" parameter regimes) when the
number of states exceeds the number of mixture classes. As stated above, the
set of examples presented here has dimension two less than
the ambient space (even though the conditions of the Allman and Rhodes
work are not satisfied). However, we note that even when tree topology is
generically identifiable (but not globally identifiable) for some model, 
arguments similar to the above can show that there exist positive-volume 
regions where the data is closer to that from a tree of a different topology than 
a tree of the same topology. 

A related though distinct question concerns identifiability under 
mixture models when the data partitions are known. For example, we may 
have a number of independent sequence data sets for the same set of taxa, 
perhaps corresponding to different genes. In this setting it may be reasonable 
to assume that the sequence sites within each data set evolve under the same 
branch lengths (perhaps subject to some i.i.d. rates-across-sites distribution), 
but that the branch lengths between the data sets may vary. The underlying 
tree topology may be the same or different across the data sets, however let us 
first consider the case where there is a common underlying topology. In the 
case where each data set consists of sequences of length one we are back in the 
setting of phylogenetic mixtures considered above. However, for longer blocks 
of sequences, we might hope to exploit the knowledge that the sequences 
within each block have evolved under a common mechanism. If the sequence 
length within any one data set becomes large we will be able to infer the 
underlying tree for that data set correctly, so the interesting question is what 
happens when the data sets provide only 'mild' support for their particular 
reconstructed tree. Assume that all (or nearly all) of the data sets contain 
sufficiently many sites so that the tree reconstruction method M positively 
favors the true tree over any particular alternative tree. By this we mean that 
M returns the true tree with a probability that is greater by a factor of at 
least $1 + \epsilon$ (with $\epsilon > 0$) than the probability that M returns each particular
different tree. Then it is easily shown that a majority rule selection procedure
applied to the reconstructed trees across the $k$ independent data sets will
correctly return the true underlying tree topology with a probability that goes
to 1 as $k$ grows. Note that this claim holds generally, not just for the two-state
symmetric model. Of course it is also possible that the underlying tree may
differ across data sets, in the case of genes perhaps due to lineage sorting
(Degnan and Rosenberg, 2006), in which case the reconstruction question
becomes more complex. 
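
The selection procedure can be illustrated with a small simulation, under the assumption that "majority rule" means choosing the topology returned most often across the data sets. The function name `plurality_tree` and the return probabilities (0.4 for the true tree, 0.3 for each alternative) are hypothetical.

```python
import random
from collections import Counter


def plurality_tree(k, probs, rng):
    """Simulate k independent reconstructions; return the most common topology.

    probs[j] is the assumed probability that method M returns topology j on a
    single data set; topology 0 plays the role of the true tree.
    """
    votes = Counter(rng.choices(range(len(probs)), weights=probs, k=k))
    return votes.most_common(1)[0][0]
```

With `probs = [0.4, 0.3, 0.3]` the true tree is favored by a factor of 4/3 over each alternative. For a few thousand data sets the vote count for topology 0 exceeds the others by many standard deviations, which is the concentration behind the claim.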

In a forthcoming article (Matsen, Mossel, and Steel 2007) we further 
investigate identifiability of mixture models. Using geometric methods we 
make some progress towards understanding how "common" non-identifiable 
mixtures should be for the symmetric and non-symmetric two-state models; 
for mixtures of many trees they appear to be quite common. A new 
combinatorial theorem implies identifiability for certain types of mixture 
models when branch lengths are clock-like. A simple argument shows 
identifiability for rates-across-sites models. We also investigate mixed branch 
repulsion for larger trees. 

Many interesting questions remain. First of all, is exact mixed branch 
repulsion an issue for any nontrivial model on four states? Also, what is the 
zone of parameter space for which a mixture of branch lengths on a tree is 
closer (in some meaningful way) to the expected site pattern frequencies of a
tree of different topology than to those for a tree of the original topology? 
How often does mixed branch repulsion present itself given "random" branch 
lengths? Considering the rapid pace of development in this field we do not 
expect these questions to be open for long. 

Appendix



In this section we provide more precise statements and proofs of the
propositions in the text. The proofs are presented in the reverse of the order
in which they were stated in the main text: first the fact that it is possible to mix two
branch length sets on a tree to mimic a tree of the same topology, then that it is
possible to mix branch lengths to mimic a tree of a distinct topology.

As stated in the main text, the general strategy of the proofs is
simple: use the Hadamard transform to calculate Fourier transforms of site
pattern probabilities and then insert these formulas into the phylogenetic
invariants. These steps would become very messy except for a number of
simplifications: First, because the discrete Fourier transform is linear, a
transform of a mixture is simply a mixture of the corresponding transforms.
Second, the fact that the original trees satisfy a set of phylogenetic invariants
reduces the complexity of the mixed invariants. Finally, the product of the
exponentials of the branch lengths appears in all formulas, and division leads to
a substantial simplification.

First we remind the reader of the main tools and fix notation. Note 
that for the entire paper we will be working with the two-state symmetric 
(also known as Cavender-Farris-Neyman) model. 

The Hadamard transform and phylogenetic invariants 
For a given edge $e$ of branch length $\gamma(e)$ we will denote

$$\theta(e) = \exp(-2\gamma(e)) \tag{1}$$

which ranges between zero and one for positive branch lengths. We call this
number the "fidelity" of the edge, as it quantifies the quality of transmission of
the ancestral state across the edge. For $A \subseteq \{1, \ldots, n\}$ of even order, let
$q_A = (H_{n-1} p)_A$ be the Fourier transform of the split probabilities, where $H_n$ is
the $n$ by $n$ Hadamard matrix (Semple and Steel, 2003).


Quartet trees will be designated by their splits, i.e. 13|24 refers to a 
quartet with taxa labeled 1 and 3 on one side of the quartet and taxa 2 and 4 
on the other. 

By the first identity in the proof of Theorem 8.6.3 of
Semple and Steel (2003) one can express the Fourier transform of the split
probabilities in terms of products of fidelities. That is, for any subset
$A \subseteq \{1, \ldots, n\}$ of even order,

$$q_A = \prod_{e \in \mathcal{P}(T,A)} \theta(e) \tag{2}$$

where $\mathcal{P}(T, A)$ is the set of edges which lie in the set of edge-disjoint paths
connecting the taxa in $A$ to each other. This set is uniquely defined (again, see
Semple and Steel (2003)).



From this equation, we can derive values for the fidelities from the
Fourier transforms of the split probabilities. In particular, it is simple to write
out the fidelity of a pendant edge on a quartet. For example,

$$\theta_1 = \sqrt{\frac{\theta_1\theta_5\theta_4 \cdot \theta_1\theta_2}{\theta_2\theta_5\theta_4}} = \sqrt{\frac{q_{14}\, q_{12}}{q_{24}}}$$

for a tree of topology 12|34. In general, we have the following lemma:

Lemma 1. If $a$, $b$, and $c$ are distinct pendant edge labels on a quartet such
that $a$ and $b$ are adjacent, then the fidelity of the pendant edge $a$ is

$$\sqrt{\frac{q_{ab}\, q_{ac}}{q_{bc}}}.$$

A similar calculation leads to an analogous lemma for the internal
edge:

Lemma 2. The fidelity of the internal edge of an $ab|cd$ quartet tree is

$$\sqrt{\frac{q_{ac}\, q_{bd}}{q_{ab}\, q_{cd}}}.$$


This paper will also make extensive use of the method of phylogenetic
invariants. These are polynomial identities in the Fourier transform of the
split probabilities which are satisfied for a given tree topology. Invariants are
understood in a very general setting (see Sturmfels and Sullivant (2005));
however here we only require invariants for the simplest case: a quartet tree
with the two-state symmetric model. In particular, for the quartet tree $ab|cd$,
the two phylogenetic invariants are

$$q_{abcd} - q_{ab}\, q_{cd} = 0 \tag{3}$$

$$q_{ac}\, q_{bd} - q_{ad}\, q_{bc} = 0. \tag{4}$$

A $q$-vector mimics the Fourier transforms of site pattern frequencies of a
nontrivial tree exactly when it satisfies the phylogenetic invariants and has
corresponding edge fidelities (given by Lemmas 1 and 2) between zero and one.

This paper is primarily concerned with the following situation: a 
mixture of two sets of branch lengths on a quartet tree which mimics the site 
pattern frequencies of an unmixed tree. We fix the following notation: the two 
branch length sets will be called $t_i$ and $s_i$, the corresponding fidelities will be
called $\theta_i$ and $\psi_i$, and the Fourier transforms of the site pattern frequencies will
be labeled with $q$ and $r$, respectively. The internal edge of the quartet will
carry the label $i = 5$, and the pendant edges are labeled according to their
terminal taxa (e.g. $i = 2$ is the edge terminating in the second taxon). The
mixing weight will be written $\alpha$, and we make the convention that the mixture
will take the $t_i$ branch length set with probability $\alpha$ and $s_i$ with
probability $1 - \alpha$.

Mixtures mimicking a tree of the same topology

In this section we describe conditions on mixtures such that a
nontrivial mixture of two branch length sets on 12|34 can give the same
probability distribution as a single tree of the same topology.

Mixing two branch length sets on a 12|34 quartet tree with the above
notation leads to the following form of invariant for a resulting tree also of
topology 12|34:

$$(\alpha + 1 - \alpha)(\alpha q_{1234} + (1-\alpha) r_{1234}) - (\alpha q_{12} + (1-\alpha) r_{12})(\alpha q_{34} + (1-\alpha) r_{34}) = 0. \tag{5}$$

Multiplying out terms then collecting, there will be an $\alpha^2 (q_{1234} - q_{12} q_{34})$ term
which is zero by the phylogenetic invariants for the 12|34 topology. Similarly,
the terms with $(1-\alpha)^2$ vanish. Dividing by $\alpha(1-\alpha)$, which we assume to be
nonzero, equation (5) becomes

$$q_{1234} + r_{1234} - (q_{12} r_{34} + r_{12} q_{34}) = 0.$$

Applying invariant (3) for the 12|34 topology and simplifying leads to the
following equivalent form of (5):

$$(q_{12} - r_{12})(q_{34} - r_{34}) = 0. \tag{6}$$

The same sorts of moves lead to the second invariant of the mixed tree:

$$q_{13} r_{24} + r_{13} q_{24} - (q_{14} r_{23} + r_{14} q_{23}) = 0. \tag{7}$$

The fact that $\alpha$ doesn't appear in these equations already delivers an
interesting fact: if a mixture of two branch length sets in this setting satisfies the
phylogenetic invariants for a single $\alpha$, then it does so for all $\alpha$. Geometrically,
this means if the line between two points on the subvariety cut out by the
phylogenetic invariants intersects the subvariety nontrivially then it sits
entirely in the subvariety.

We can gain more insight by considering these equations in terms of
fidelities. Direct substitution using (2) into (6) gives

$$(\theta_1\theta_2 - \psi_1\psi_2)(\theta_3\theta_4 - \psi_3\psi_4) = 0.$$

This equation will be satisfied exactly when the branch lengths satisfy

$$t_1 + t_2 = s_1 + s_2 \quad \text{or} \quad t_3 + t_4 = s_3 + s_4. \tag{8}$$

The corresponding substitution into (7) and then division by $\theta_2\theta_5\theta_4\psi_2\psi_5\psi_4$
gives after simplification

$$\left(\frac{\theta_1}{\theta_2} - \frac{\psi_1}{\psi_2}\right)\left(\frac{\theta_3}{\theta_4} - \frac{\psi_3}{\psi_4}\right) = 0.$$

This equation will be satisfied exactly when the branch lengths satisfy

$$t_1 - t_2 = s_1 - s_2 \quad \text{or} \quad t_3 - t_4 = s_3 - s_4. \tag{9}$$

To summarize,

Proposition 3. The mixture of two 12|34 quartet trees with pendant branch
lengths $t_i$ and $s_i$ satisfies the 12|34 phylogenetic invariants for the binary
symmetric model exactly (up to renumbering) when either $t_1 = s_1$ and $t_2 = s_2$,
or $t_1 + t_2 = s_1 + s_2$ and $t_3 - t_4 = s_3 - s_4$.

As described above this proposition makes no reference to the mixing
weight $\alpha$.
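
A numerical check of the two cases of the proposition might look as follows, working directly with fidelities (so that equal branch lengths correspond to equal fidelities, and sums or differences of branch lengths to products or ratios). The function names and all fidelity values are assumptions chosen for illustration.

```python
def quartet_fourier(theta):
    """Fourier values on a 12|34 quartet from fidelities (theta1, ..., theta5)."""
    th1, th2, th3, th4, th5 = theta
    return {
        "12": th1 * th2, "34": th3 * th4,
        "13": th1 * th5 * th3, "14": th1 * th5 * th4,
        "23": th2 * th5 * th3, "24": th2 * th5 * th4,
        "1234": th1 * th2 * th3 * th4,
    }


def mixed_invariants(theta, psi, alpha):
    """Values of the two 12|34 invariants on the mixture of two fidelity sets."""
    q, r = quartet_fourier(theta), quartet_fourier(psi)
    # The Fourier transform is linear, so the mixture is mixed coordinate-wise.
    m = {key: alpha * q[key] + (1.0 - alpha) * r[key] for key in q}
    first = m["1234"] - m["12"] * m["34"]
    second = m["13"] * m["24"] - m["14"] * m["23"]
    return first, second
```

Taking `theta = (0.8, 0.5, 0.6, 0.3, 0.9)`, the set `psi = (0.8, 0.5, 0.2, 0.7, 0.4)` shares the first two pendant fidelities, while `psi = (0.5, 0.8, 0.4, 0.2, 0.4)` matches the product of the first pair and the ratio of the second pair. In both cases the two invariants vanish, up to rounding, for every mixing weight tried.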

In quartets where $t_1 = s_1$ and $t_2 = s_2$, the resulting tree will also have
pendant branch lengths $t_1$ and $t_2$:

Proposition 4. A mixture of two 12|34 quartet trees with branch lengths $t_i$
and $s_i$ which satisfies $t_1 = s_1$ and $t_2 = s_2$ will have resulting pendant branch
lengths for the first and second taxa equal to $t_1$ and $t_2$, respectively.

Proof. Let the fidelity of the edges leading to taxon one and two be denoted
$\mu_1$ and $\mu_2$. We have by Lemma 1 with $a = 1$, $b = 2$ and $c = 3$,

$$\mu_1 = \sqrt{\frac{(\alpha\theta_1\theta_2 + (1-\alpha)\psi_1\psi_2) \cdot (\alpha\theta_1\theta_5\theta_3 + (1-\alpha)\psi_1\psi_5\psi_3)}{\alpha\theta_2\theta_5\theta_3 + (1-\alpha)\psi_2\psi_5\psi_3}}.$$

This expression is equal to $\theta_1$ after substituting $\psi_1 = \theta_1$ and $\psi_2 = \theta_2$, which are
implied by the hypothesis. The same calculation implies that $\mu_2 = \theta_2$. □

In the rest of this section we note that anomalous branch lengths can 
emerge from mixtures of trees mimicking a tree of the same topology. 

Proposition 5. It is possible to mix two sets of branch lengths on a tree to 
mimic a tree of the same topology such that one resulting pendant branch 
length is arbitrarily small while the corresponding branch length in either of the 
branch length sets being mixed stays above some arbitrarily large fixed value. 

Proof. To get such an anomalous mixture, set $\theta_1 = \psi_1$, $\theta_3 = \psi_3$, $\theta_4 = \psi_4$,
$\theta_2 = \psi_5$, $\theta_5 = \psi_2$, and $\alpha = .5$. The equations (8) and (9) are satisfied because
$\theta_3 = \psi_3$ and $\theta_4 = \psi_4$, and therefore $t_3 = s_3$ and $t_4 = s_4$. This implies that the
mixture will indeed satisfy the phylogenetic invariants.

Now, because again the Fourier transform of a mixture is the mixture
of the Fourier transforms, using Lemma 1 and simplifying gives

$$\mu_1 = \frac{\theta_1}{2}\left(\sqrt{\frac{\theta_2}{\theta_5}} + \sqrt{\frac{\theta_5}{\theta_2}}\right). \tag{10}$$

Now note that by making the ratio $\theta_2/\theta_5$ small, it is possible to have
$\mu_1$ be close to one although $\theta_1$ can be small. This setting corresponds (via (1))
to the case of the first branch length of the resulting tree going to zero
although the trees used to make the mixture may have long first branch
lengths. It can be checked by calculations analogous to (10) that the other
fidelities of the tree resulting from mixing will be, in order, $\sqrt{\theta_2\theta_5}$, $\theta_3$, $\theta_4$,
$\sqrt{\theta_2\theta_5}$. These are clearly strictly between zero and one, so the resulting tree
will have positive branch lengths. □
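
The construction in this proof can be tried numerically. The sketch below assumes illustrative fidelities: first pendant fidelity 0.2 in both sets, and the second pendant and internal fidelities 0.01 and 0.9 swapped between the sets. The function name `anomalous_mixture` is hypothetical.

```python
import math


def quartet_fourier(theta):
    """Fourier values on a 12|34 quartet from fidelities (theta1, ..., theta5)."""
    th1, th2, th3, th4, th5 = theta
    return {
        "12": th1 * th2, "34": th3 * th4,
        "13": th1 * th5 * th3, "14": th1 * th5 * th4,
        "23": th2 * th5 * th3, "24": th2 * th5 * th4,
        "1234": th1 * th2 * th3 * th4,
    }


def anomalous_mixture(th1, th2, th3, th4, th5):
    """Equal-weight mixture where the second set swaps fidelities 2 and 5.

    Returns the fidelities (mu1, mu2) of pendant edges 1 and 2 of the
    resulting 12|34 tree, computed with Lemma 1 (a, b adjacent; c = 3).
    """
    q = quartet_fourier((th1, th2, th3, th4, th5))
    r = quartet_fourier((th1, th5, th3, th4, th2))
    m = {key: 0.5 * q[key] + 0.5 * r[key] for key in q}
    mu1 = math.sqrt(m["12"] * m["13"] / m["23"])
    mu2 = math.sqrt(m["12"] * m["23"] / m["13"])
    return mu1, mu2
```

With `anomalous_mixture(0.2, 0.01, 0.5, 0.5, 0.9)` the first pendant fidelity of the resulting tree is about 0.96, i.e. a branch length near 0.02, although both mixed trees have first pendant fidelity 0.2, i.e. branch length about 0.8.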



Mixtures mimicking a tree of a different topology 

In this section we answer the question of what branch lengths on a 
quartet can mix to mimic a quartet of a different topology. 

Proposition 6. Let $k_1, \ldots, k_4$ satisfy the following inequalities:

$$k_1 > k_3 > k_4 > 1 > k_2 > 0, \tag{11}$$

$$\frac{1-k_1^2}{k_1}\,\frac{1-k_4^2}{k_4} + \frac{1-k_2^2}{k_2}\,\frac{1-k_3^2}{k_3} > 0, \tag{12}$$

$$\frac{k_1+k_4}{1+k_1 k_4} \cdot \frac{k_2+k_3}{1+k_2 k_3} > 1. \tag{13}$$

Then there exists $\pi_5$ such that for any $\pi_5 < k_5 < \pi_5^{-1}$ sufficiently close to
either $\pi_5$ or $\pi_5^{-1}$ there exists a mixing weight such that for any $t_1, \ldots, t_5$ and
$s_1, \ldots, s_5$ satisfying $\pi_5 = \exp(-2(t_5 + s_5))$ and $k_i = \exp(-2(t_i - s_i))$ for
$i = 1, \ldots, 5$, the corresponding mixture of two 12|34 trees will satisfy the
phylogenetic invariants for a single tree of the 13|24 topology. The resulting
internal branch length is guaranteed to be positive, and the pendant branch
lengths will be positive as long as the pendant branch lengths being mixed are
sufficiently large.

Proof. Let $m$ denote the Fourier transform vector of the site pattern
frequencies of the mixture. The invariants for a tree of topology 13|24 are (by
(3) and (4))

$$m_{1234} - m_{13}\,m_{24} = 0, \tag{14}$$

$$m_{12}\,m_{34} - m_{14}\,m_{23} = 0. \tag{15}$$

As before, we insert the mixture of the Fourier transforms of the
pattern frequencies into the invariants. For the first invariant,

$$(\alpha + 1 - \alpha)\,(\alpha\, q_{1234} + (1-\alpha)\, r_{1234}) - (\alpha\, q_{13} + (1-\alpha)\, r_{13})(\alpha\, q_{24} + (1-\alpha)\, r_{24}) = 0.$$

Multiplying, this is equivalent to

$$\alpha^2 (q_{1234} - q_{13}q_{24}) + \alpha(1-\alpha)\,\bigl(q_{1234} + r_{1234} - (q_{13}r_{24} + r_{13}q_{24})\bigr) + (1-\alpha)^2 (r_{1234} - r_{13}r_{24}) = 0. \tag{16}$$

A similar calculation with the second invariant leads to

$$\alpha^2 (q_{12}q_{34} - q_{14}q_{23}) + \alpha(1-\alpha)\,\bigl(q_{12}r_{34} + r_{12}q_{34} - (q_{14}r_{23} + r_{14}q_{23})\bigr) + (1-\alpha)^2 (r_{12}r_{34} - r_{14}r_{23}) = 0. \tag{17}$$

Rather than (16) and (17) themselves, we can take (16) and the
difference of (16) and (17). Because the $q$ and $r$ vectors come from a tree
with topology 12|34, they satisfy $q_{1234} = q_{12}q_{34}$ and
$q_{13}q_{24} = q_{14}q_{23}$ and the equivalent equations for the $r$. Thus
the difference of (16) and (17) can be simplified to (assuming
$\alpha(1-\alpha) \neq 0$)

$$q_{1234} + r_{1234} - (q_{12}r_{34} + r_{12}q_{34}) = q_{13}r_{24} + r_{13}q_{24} - (q_{14}r_{23} + r_{14}q_{23}). \tag{18}$$

We would like to ensure that the tree coming from the mixture has
nonzero internal branch length. By Lemma 5 this is equivalent to showing that

$$m_{13}\,m_{24} > m_{14}\,m_{23}. \tag{19}$$

Substituting in for the mixture fidelities and simplifying results in

$$\alpha^2 (q_{13}q_{24} - q_{14}q_{23}) + \alpha(1-\alpha)\,\bigl(q_{13}r_{24} + r_{13}q_{24} - (q_{14}r_{23} + r_{14}q_{23})\bigr) + (1-\alpha)^2 (r_{13}r_{24} - r_{14}r_{23}) > 0.$$

The first and last terms of this expression vanish because the $q$ and $r$
satisfy the 12|34 phylogenetic invariants coming from (3) and (4). Simplifying
leads to

$$q_{13}r_{24} + r_{13}q_{24} > q_{14}r_{23} + r_{14}q_{23}. \tag{20}$$

Define $k_i = \psi_i/\theta_i$ for $i = 1, \ldots, 5$ and
$\rho = \alpha/(1-\alpha)$. Note that

$$0 < \theta_i < \min(k_i^{-1}, 1) \quad\text{and}\quad 0 < k_i < \infty \tag{21}$$

is equivalent to $0 < \theta_i < 1$ and $0 < \psi_i < 1$. Define

$$\chi_{12} = k_1k_2 + k_3k_4, \qquad \chi_{13} = k_1k_3 + k_2k_4,$$

$$\chi_{14} = k_1k_4 + k_2k_3, \qquad \chi_{1234} = 1 + k_1k_2k_3k_4.$$

Later we will make use of the fact that the $\chi$ are invariant under the
action of the Klein four group.
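The invariance of the $\chi$ can be confirmed directly. The sketch below
assumes the definitions $\chi_{12} = k_1k_2 + k_3k_4$,
$\chi_{13} = k_1k_3 + k_2k_4$, $\chi_{14} = k_1k_4 + k_2k_3$ and
$\chi_{1234} = 1 + k_1k_2k_3k_4$; the function names and the sample values of
$k$ are hypothetical.

```python
def chi(k):
    """Return (chi12, chi13, chi14, chi1234) for k = (k1, k2, k3, k4)."""
    k1, k2, k3, k4 = k
    return (k1 * k2 + k3 * k4,
            k1 * k3 + k2 * k4,
            k1 * k4 + k2 * k3,
            1 + k1 * k2 * k3 * k4)

# The Klein four group acting on 0-based indices: the identity and the three
# double transpositions (12)(34), (13)(24), (14)(23).
KLEIN_FOUR = [(0, 1, 2, 3), (1, 0, 3, 2), (2, 3, 0, 1), (3, 2, 1, 0)]

def permute(k, perm):
    """Relabel the entries of k according to perm."""
    return tuple(k[i] for i in perm)

def is_klein_invariant(k, tol=1e-12):
    """Check that every element of the Klein four group fixes all four chi."""
    base = chi(k)
    return all(
        all(abs(x - y) <= tol for x, y in zip(base, chi(permute(k, g))))
        for g in KLEIN_FOUR
    )

k_sample = (21.0, 0.11, 3.0, 2.0)  # hypothetical sample values
```

A single transposition such as swapping indices 1 and 2 is not in the group
and in general exchanges $\chi_{13}$ and $\chi_{14}$, which is why only the
Klein four group can be used for renumbering.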

Using these definitions, direct substitution using (2) into (16), (18),
and (20) and some simplification shows that the set of equations

$$\rho^2(1 - \theta_5^2) + \rho\,(\chi_{1234} - \theta_5\psi_5\chi_{13}) + (1 - \psi_5^2)(\chi_{1234} - 1) = 0 \tag{22}$$

$$\chi_{1234} - \chi_{12} = \theta_5\psi_5\,(\chi_{13} - \chi_{14}) \tag{23}$$

$$\chi_{13} > \chi_{14} \tag{24}$$

is equivalent to equations (14), (15) and (19).

Equation (23) is simply satisfied by setting

$$\theta_5\psi_5 = \frac{\chi_{1234} - \chi_{12}}{\chi_{13} - \chi_{14}}. \tag{25}$$

However, in doing so, we must require that this ratio is strictly between zero
and one. The fact that it must be less than one can be written

$$\chi_{14} + \chi_{1234} < \chi_{12} + \chi_{13} \tag{26}$$

which by a short calculation is equivalent to (13). Later it will be shown that
other equations imply that (25) is greater than zero.

Assign variables $A$, $B$, and $C$ in the standard way such that (22) can
be written $A\rho^2 + B\rho + C$. The $A$ and $C$ terms are strictly positive,
thus the existence of a $0 < \rho < \infty$ satisfying this equation implies

$$B < 0 \quad\text{and}\quad B^2 - 4AC \geq 0. \tag{27}$$

On the other hand, (27) implies the existence of a $0 < \rho < \infty$
satisfying (22). Note that using (25), $B < 0$ is equivalent to

$$\chi_{1234} - \frac{\chi_{1234} - \chi_{12}}{\chi_{13} - \chi_{14}}\,\chi_{13} < 0.$$

Multiplying by $\chi_{13} - \chi_{14}$, which is positive by (24), this
equation is equivalent to

$$\chi_{12}\chi_{13} < \chi_{1234}\chi_{14} \tag{28}$$

which by a short calculation is equivalent to (12). The conclusion then is that
the existence of a $\rho > 0$ satisfying (22) is equivalent to (12) and
$B^2 - 4AC \geq 0$ given the rest of the invariants.
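The two "short calculations" relating (26) to (13) and (28) to (12) amount to
factoring identities, which can be spot-checked numerically. This is a sketch
under the assumption that (12) and (13) have the forms written in the
docstrings below; the function names and sample values are hypothetical.

```python
import math

def chi(k):
    """Return (chi12, chi13, chi14, chi1234) for k = (k1, k2, k3, k4)."""
    k1, k2, k3, k4 = k
    return (k1 * k2 + k3 * k4, k1 * k3 + k2 * k4,
            k1 * k4 + k2 * k3, 1 + k1 * k2 * k3 * k4)

def lhs_12(k):
    """Assumed (12): (1-k1^2)/k1 * (1-k4^2)/k4 + (1-k2^2)/k2 * (1-k3^2)/k3 > 0."""
    k1, k2, k3, k4 = k
    return ((1 - k1**2) / k1 * (1 - k4**2) / k4
            + (1 - k2**2) / k2 * (1 - k3**2) / k3)

def lhs_13(k):
    """Assumed (13): (k1+k4)/(1+k1*k4) * (k2+k3)/(1+k2*k3) > 1."""
    k1, k2, k3, k4 = k
    return (k1 + k4) / (1 + k1 * k4) * (k2 + k3) / (1 + k2 * k3)

def short_calculations(k):
    """Return both sides of the two factoring identities.

    (28): chi1234*chi14 - chi12*chi13 = k1*k2*k3*k4 * lhs_12(k)
    (26): chi12 + chi13 - chi14 - chi1234
          = (1 + k1*k4) * (1 + k2*k3) * (lhs_13(k) - 1)
    """
    k1, k2, k3, k4 = k
    c12, c13, c14, c1234 = chi(k)
    gap28 = c1234 * c14 - c12 * c13
    gap26 = c12 + c13 - c14 - c1234
    rhs28 = k1 * k2 * k3 * k4 * lhs_12(k)
    rhs26 = (1 + k1 * k4) * (1 + k2 * k3) * (lhs_13(k) - 1)
    return (gap28, rhs28), (gap26, rhs26)
```

Because the multiplying factors $k_1k_2k_3k_4$ and $(1+k_1k_4)(1+k_2k_3)$ are
positive, the sign of each gap agrees with the corresponding inequality.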

Now, (24) and (28) imply that $\chi_{12} < \chi_{1234}$. Therefore,
according to (25) the product $\theta_5\psi_5$ is greater than zero given (24).
For convenience, set $\pi_5 = \theta_5\psi_5$, which as described is determined
by $k_1, \ldots, k_4$. Now, $\theta_5$ being less than one and $\psi_5$ being
less than one are equivalent to

$$\pi_5 < k_5 < \pi_5^{-1}. \tag{29}$$

In summary, the problem of finding branch lengths and a mixing parameter
such that the derived variables satisfy (14), (15) and (19) is equivalent to
finding $k_i$ and $\theta_i$ satisfying (12), (13), (21), (23), (25), (29) and
$B^2 - 4AC \geq 0$, which can be written

$$(\chi_{1234} - \pi_5\chi_{13})^2 - 4\,(1 - \pi_5/k_5)(1 - \pi_5 k_5)(\chi_{1234} - 1) \geq 0. \tag{30}$$

Note that $\chi_{1234} = \pi_5\chi_{13}$ is impossible using (23) and (28).
Therefore (30) can be satisfied while fixing the other variables by taking
$k_5$ close to $\pi_5$ or $\pi_5^{-1}$ while satisfying (29).
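The summary above suggests a direct recipe: compute $\pi_5$ from (25), choose
$k_5$ inside (29), solve the quadratic (22) for $\rho$, and confirm that the
mixture satisfies the 13|24 invariants. The following is a minimal sketch of
that recipe, assuming $\theta_5 = \sqrt{\pi_5/k_5}$ and
$\psi_5 = \sqrt{\pi_5 k_5}$ (which follow from $\pi_5 = \theta_5\psi_5$ and
$k_5 = \psi_5/\theta_5$). All function names and sample values are
hypothetical.

```python
import math

def chi(k):
    """Return (chi12, chi13, chi14, chi1234) for k = (k1, k2, k3, k4)."""
    k1, k2, k3, k4 = k
    return (k1 * k2 + k3 * k4, k1 * k3 + k2 * k4,
            k1 * k4 + k2 * k3, 1 + k1 * k2 * k3 * k4)

def solve_rho(k, k5):
    """Return (theta5, psi5, [rho roots of (22)]) or raise ValueError."""
    c12, c13, c14, c1234 = chi(k)
    pi5 = (c1234 - c12) / (c13 - c14)              # equation (25)
    if not (0 < pi5 < 1 and pi5 < k5 < 1 / pi5):   # ratio in (0, 1) and (29)
        raise ValueError("pi5 or k5 out of range")
    theta5, psi5 = math.sqrt(pi5 / k5), math.sqrt(pi5 * k5)
    a = 1 - theta5**2
    b = c1234 - pi5 * c13
    c = (1 - psi5**2) * (c1234 - 1)
    disc = b * b - 4 * a * c                       # left-hand side of (30)
    if b >= 0 or disc < 0:                         # condition (27) fails
        raise ValueError("no positive mixing ratio")
    root = math.sqrt(disc)
    return theta5, psi5, [(-b - root) / (2 * a), (-b + root) / (2 * a)]

def fourier_12_34(fid):
    """Transform entries of a 12|34 quartet: products of fidelities on paths."""
    f1, f2, f3, f4, f5 = fid
    return {"12": f1 * f2, "34": f3 * f4, "13": f1 * f5 * f3,
            "14": f1 * f5 * f4, "23": f2 * f5 * f3, "24": f2 * f5 * f4,
            "1234": f1 * f2 * f3 * f4}

def residuals(k, k5, theta14, rho):
    """Left sides of (14) and (15), and the gap m13*m24 - m14*m23 from (19)."""
    theta5, psi5, _ = solve_rho(k, k5)
    alpha = rho / (1 + rho)                        # since rho = alpha/(1-alpha)
    q = fourier_12_34(tuple(theta14) + (theta5,))
    psi14 = tuple(ki * ti for ki, ti in zip(k, theta14))
    r = fourier_12_34(psi14 + (psi5,))
    m = {key: alpha * q[key] + (1 - alpha) * r[key] for key in q}
    return (m["1234"] - m["13"] * m["24"],
            m["12"] * m["34"] - m["14"] * m["23"],
            m["13"] * m["24"] - m["14"] * m["23"])
```

For the sample values $k = (21, 0.11, 3, 2)$ the choice $k_5 = 1$ makes the
discriminant negative, whereas $k_5 = 2.5$, closer to $\pi_5^{-1}$, yields two
positive roots, in line with the remark that $k_5$ should be taken near an end
of the interval (29).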

Now we show that (possibly after relabeling) equation (11) is
equivalent to (24) in the presence of the other inequalities. Recall that the
$\chi$ are invariant under the action of the Klein group acting on the indices
of $k_i$. Because the invariants are equivalent to equations which can be
expressed in terms of the $\chi$ with $\theta_5$ and $\psi_5$, we can assume
that $k_1 \geq k_2$ and $k_1 \geq k_3$ by renumbering via an element of the
Klein group.

Now, subtract $\chi_{12}\chi_{14}$ from (28) to find

$$\chi_{12}(\chi_{13} - \chi_{14}) < (\chi_{1234} - \chi_{12})\,\chi_{14}.$$

Rearranging (26), it is clear that this implies that

$$\chi_{12} < \chi_{14}. \tag{31}$$

Inserting the definition of the $\chi$ into (24) and (31) shows that these
equations are equivalent to

$$0 < (k_1 - k_2)(k_3 - k_4) \quad\text{and}\quad 0 < (k_1 - k_3)(k_4 - k_2). \tag{32}$$

We have assumed by symmetry that $k_1 \geq k_2$ and $k_1 \geq k_3$; now (32)
shows that $k_1$ can't be equal to either $k_2$ or $k_3$. Also, (32) shows that
$k_3 > k_4$ and $k_4 > k_2$. All of these inequalities put together imply that
$k_1 > k_3 > k_4 > k_2$, which directly implies (24).

Furthermore, another rearrangement of (26) using the inequality (31)
leads to $\chi_{1234} < \chi_{13}$. This after substitution gives
$(1 - k_1k_3)(1 - k_2k_4) < 0$, which implies that it is impossible for all of
the $k_i$ to be either less than or greater than one.

Note that (12) excludes the case $k_1 > k_3 > 1 > k_4 > k_2$; this leaves
$k_1 > 1 > k_3 > k_4 > k_2$ and $k_1 > k_3 > k_4 > 1 > k_2$. We can assume the
latter without loss of generality by exchanging the $\theta_i$ and the $\psi_i$
(which corresponds to replacing $k_i$ with $k_i^{-1}$) and renumbering.

So far we have described how to find values for the branch lengths so
that the invariants (3) and (4) and the internal branch length inequality (19)
are satisfied. However, we also need to check that the resulting pendant
branch lengths for the tree are positive. Here we describe how this can be
achieved by taking a lower bound on the values of $t_i$.

Assume edges $a$ and $b$ are adjacent on the 12|34 trees being mixed,
and $a$ and $c$ are adjacent on the resulting 13|24 tree. Then, by Lemma 1,
the fidelity of the pendant $a$ edge is

$$\sqrt{\frac{(\alpha\theta_a\theta_b + (1-\alpha)\psi_a\psi_b)\,(\alpha\theta_a\theta_5\theta_c + (1-\alpha)\psi_a\psi_5\psi_c)}{\alpha\theta_b\theta_5\theta_c + (1-\alpha)\psi_b\psi_5\psi_c}}.$$

In order to assure that the resulting pendant branch length for edge $a$ is
positive, we must show that the above fidelity is less than one. This is
equivalent to showing that $\theta_a$ must satisfy

$$\theta_a < \sqrt{\frac{\alpha + (1-\alpha)k_bk_5k_c}{(\alpha + (1-\alpha)k_ak_b)\,(\alpha + (1-\alpha)k_ak_5k_c)}} \tag{33}$$

for all such $a$, $b$, $c$ triples. Thus this equation along with (21) imply
upper bounds for $\theta_a$; by the definition of fidelities these translate to
lower bounds for $t_a$. This concludes the proof. □
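The pendant-edge condition can be illustrated with a short computation. The
sketch below assumes the bound has the form
$\theta_a < \sqrt{(\alpha + (1-\alpha)k_bk_5k_c) / ((\alpha + (1-\alpha)k_ak_b)(\alpha + (1-\alpha)k_ak_5k_c))}$
and combines it with $\theta_a < \min(k_a^{-1}, 1)$ from (21). The triples,
function names and sample values are hypothetical, and the mixing weight used
here is arbitrary rather than a solution of (22), since the bound itself does
not depend on that choice.

```python
import math

# (a, b, c) index triples (0-based taxa): a and b form a cherry on 12|34,
# while a and c form a cherry on 13|24.
TRIPLES = [(0, 1, 2), (1, 0, 3), (2, 3, 0), (3, 2, 1)]

def bound_33(a, k, k5, alpha):
    """Right-hand side of the pendant bound for edge a."""
    _, b, c = TRIPLES[a]
    w = 1 - alpha
    num = alpha + w * k[b] * k5 * k[c]
    den = (alpha + w * k[a] * k[b]) * (alpha + w * k[a] * k5 * k[c])
    return math.sqrt(num / den)

def theta_upper_bound(a, k, k5, alpha):
    """Combine the pendant bound with theta_a < min(1/k_a, 1) from (21)."""
    return min(bound_33(a, k, k5, alpha), 1 / k[a], 1.0)

def pendant_fidelity(a, theta, theta5, k, k5, alpha):
    """Fidelity of pendant edge a in the 13|24 tree produced by the mixture."""
    _, b, c = TRIPLES[a]
    w = 1 - alpha
    psi = [ki * ti for ki, ti in zip(k, theta)]   # psi_i = k_i * theta_i
    psi5 = k5 * theta5
    m_ab = alpha * theta[a] * theta[b] + w * psi[a] * psi[b]
    m_ac = alpha * theta[a] * theta5 * theta[c] + w * psi[a] * psi5 * psi[c]
    m_bc = alpha * theta[b] * theta5 * theta[c] + w * psi[b] * psi5 * psi[c]
    return math.sqrt(m_ab * m_ac / m_bc)
```

Since fidelities are $\exp(-2t)$, an upper bound $u$ on $\theta_a$ corresponds
to the lower bound $t_a > -\tfrac{1}{2}\log u$ on the branch length.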

Note that the proof actually completely characterizes (up to 
relabeling) the set of branch lengths and mixing weights such that the 
resulting mixture mimics a tree of different topology. 

Proposition 7. If two sets of branch lengths on the 12|34 tree mix to mimic a
tree of the topology 13|24 then up to relabeling the associated $k_i$ must
satisfy the inequalities (11), (12), (13), and (29); the $\theta_i$ must
satisfy the inequalities (21) and (33). The two required equalities are that
the product $\theta_5\psi_5$ must satisfy (25), and the associated $\rho$ must
satisfy (22).



Kullback-Leibler lemma 



Lemma 8. Assume some group-based model $G$ and let $\Delta$ be the probability
simplex for distributions on four taxa under $G$. Let $V \subset \Delta$ be the
set of all site-pattern frequencies for some quartet tree under $G$. Then

$$\delta_{KL}(p, V) := \min_{v \in V} \delta_{KL}(p, v)$$

exists and is continuous for all $p$ in the interior of $\Delta$.

Proof. Note that $\delta_{KL}(p, q)$ is a continuous function when probability
distributions $p$ and $q$ have no components zero, i.e. they sit in the
interior $\mathring{\Delta}$ of the probability simplex $\Delta$. We will show
that for any $p \in \mathring{\Delta}$ there exists an open neighborhood $U$ of
$p$ such that $\delta_{KL}(p', V)$ exists and is continuous for all
$p' \in U$. Given $p$ let $p_{\min}$ be the smallest component $p_i$ of $p$.
Let $U = \{p' \in \mathring{\Delta} : p'_i > p_{\min}/2\}$. Then choose
$\epsilon > 0$ such that

$$\log(p_{\min}/2) + \log(1/\epsilon) > \sup_{p' \in U}\, \inf_{q \in V} \delta_{KL}(p', q).$$

The right hand side of this equation is finite (since it is bounded above by
$\sup_{p' \in U} \delta_{KL}(p', q^*)$ for any point $q^* \in V$ with no
components zero).

Let $B = \{q \in V : q_i \geq \epsilon \text{ for all } i\}$. $V$ is a compact
set (Moulton and Steel, 2004), therefore $B \subset \Delta$ is compact as well.
Now for any $p' \in U$ and $q' \in V - B$

$$\delta_{KL}(p', q') = \sum_i p'_i \log p'_i + \sum_i p'_i \log(1/q'_i) > \log(p_{\min}/2) + \log(1/\epsilon) > \inf_{q \in V} \delta_{KL}(p', q)$$

so the infimum cannot be achieved outside $B$. Consequently,

$$\inf_{q \in V} \delta_{KL}(p', q) = \min_{q \in B} \delta_{KL}(p', q)$$

for all $p' \in U$. Thus the right hand side exists; continuity follows from
standard analytic arguments. □
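The mechanism behind the proof is that the divergence grows without bound as
any component of the second argument approaches zero, so a minimizer cannot
lie near the boundary. The sketch below illustrates this on a four-component
distribution; it assumes the usual definition
$\delta_{KL}(p, q) = \sum_i p_i \log(p_i/q_i)$, and the helper names and sample
distributions are hypothetical rather than site-pattern frequencies of an
actual tree.

```python
import math

def kl_divergence(p, q):
    """delta_KL(p, q) = sum_i p_i * log(p_i / q_i) for strictly positive p, q."""
    if any(x <= 0 for x in p) or any(x <= 0 for x in q):
        raise ValueError("all components must be strictly positive")
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def squeeze_component(q, index, eps):
    """Return a distribution with q[index] set to eps and the rest rescaled."""
    scale = (1.0 - eps) / (1.0 - q[index])
    return [eps if i == index else x * scale for i, x in enumerate(q)]

def entropy_term(p):
    """The sum_i p_i * log(p_i) part of the divergence."""
    return sum(pi * math.log(pi) for pi in p)

p_sample = [0.4, 0.3, 0.2, 0.1]        # hypothetical interior point
q_sample = [0.25, 0.25, 0.25, 0.25]    # hypothetical starting distribution
divergences = [
    kl_divergence(p_sample, squeeze_component(q_sample, 0, eps))
    for eps in (1e-1, 1e-3, 1e-6)
]
```

Each value in `divergences` is at least
$\sum_i p_i \log p_i + p_0 \log(1/\epsilon)$, because the remaining terms
$p_i \log(1/q_i)$ are nonnegative.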
Table 1: Rounded branch lengths for the examples in Figure 1. The top
division of the table is example (a); the bottom is example (b). The top two
lines in each division are the branch lengths forming the mixture and the third
line gives the branch lengths for the unmixed tree.

weight    pendant 1  pendant 2  pendant 3  pendant 4  internal

0.748646  1.772261   0.25       0.949306   0.846574   0.366516
0.251354  0.25       1.353637   0.4        0.5        0.213387
1         0.888101   0.905792   0.648625   0.654236   0.086051

0.936064  1.838398   0.2        1.397309   0.411489   0.062429
0.063936  0.2        0.543932   0.2        0.2        0.055312
1         1.011471   0.375718   0.794529   0.305338   0.360827
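The rounded values for example (a) of Table 1 can be compared against the
conditions of Proposition 6. The sketch below assumes fidelities of the form
$\exp(-2t)$ and takes the ratios $k_i$ as the fidelity in the second mixture
row divided by the fidelity in the first; which row plays the role of
$\theta$ and which of $\psi$ is an assumption made here, and the function
names are hypothetical.

```python
import math

# Rounded values from Table 1, example (a):
# (weight, pendant 1, pendant 2, pendant 3, pendant 4, internal).
ROW_1 = (0.748646, 1.772261, 0.25, 0.949306, 0.846574, 0.366516)
ROW_2 = (0.251354, 0.25, 1.353637, 0.4, 0.5, 0.213387)

def fidelity_ratios(row_a, row_b):
    """k_i = exp(-2 * b_i) / exp(-2 * a_i) for the five branch lengths."""
    return [math.exp(-2 * (b - a)) for a, b in zip(row_a[1:], row_b[1:])]

def pi5_from_k(k):
    """Product theta5 * psi5 predicted by (25) from k1, ..., k4."""
    k1, k2, k3, k4 = k[:4]
    chi12 = k1 * k2 + k3 * k4
    chi13 = k1 * k3 + k2 * k4
    chi14 = k1 * k4 + k2 * k3
    chi1234 = 1 + k1 * k2 * k3 * k4
    return (chi1234 - chi12) / (chi13 - chi14)

def pi5_from_lengths(row_a, row_b):
    """Product of the two internal fidelities, exp(-2 * (t5 + s5))."""
    return math.exp(-2 * (row_a[5] + row_b[5]))

k_table = fidelity_ratios(ROW_1, ROW_2)
```

Under these assumptions the pendant ratios come out close to 21, 0.11, 3 and
2, which satisfy the ordering (11), and the two ways of computing $\pi_5$
agree to within the rounding of the table.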
Figure 1: Mixtures of two sets of branch lengths on a tree of a given topology
can have exactly the same site pattern frequencies as a tree of a different
topology under the two-state symmetric model. The notation in the diagram
showing $x * T_1 + (1 - x) * T_1' = T_2$ means that the indicated mixture of
the two branch length sets $T_1$ and $T_1'$ shown in the diagram gives the same
expected site pattern frequencies as the tree $T_2$. The diagrams show two
examples of this "mixed branch repulsion;" the general criteria for such
mixtures are explained in the text. The branch length scale in the diagrams is
given by the line segment indicating the length of a branch with 0.5
substitutions per site. Note that the mixing weights in this example have been
rounded.
Figure 2: Mixtures of two sets of branch lengths on a tree of a given topology
can have exactly the same site pattern frequencies as a tree of the same 
topology under the two-state symmetric model. The criterion for the 
occurrence of this phenomenon is explained in the text and an example is 
shown in the figure. Note in particular that the branch lengths need not 
average: for example, the branch length for the pendant edge leading to taxon 
1 virtually disappears after mixing. 
Figure 3: A geometric depiction of the main result. The ambient space is a
projection of the seven-dimensional probability simplex of site pattern
frequencies for trees on four leaves. The gray sheet is a subset of a
two-dimensional subvariety of the site pattern frequencies for trees of the 12|34
topology, while the black sheet is an analogous subset for the 13|24 topology.
The horizontal line represents the possible mixtures for the two sets of branch
lengths for the 12|34 topology in Figure 1a. The fact that these two sets of
branch lengths can mix to make a tree of topology 13|24 is shown here by the
fact that the horizontal line intersects the black sheet.